Welcome to the SIB Days 2020 - virtual conference Spatial Transcriptomics workshop by 10x genomics!
The purpose of this tutorial will be to walk users through some of the steps necessary to explore data produced by the 10x Genomics Visium Spatail Gene Expression Solution and the Spaceranger pipeline. We will investigate the datasets whith are all freely available from 10x Genomics.
Things to know about this workshop
/mnt/libs/shared_data/library(Seurat)
library(ggplot2)
library(patchwork)
library(dplyr)
Real Dataset for the tutorial
mouse_brain_sa <- Load10X_Spatial(data.dir = "/mnt/libs/shared_data/mouse_brain_sa/outs/",
filename = "V1_Mouse_Brain_Sagittal_Anterior_filtered_feature_bc_matrix.h5")
Same data just internal to 10x
mouse_brain_sa <- Load10X_Spatial(data.dir = "/mnt/analysis/marsoc/pipestances/HMKLFDMXX/SPATIAL_RNA_COUNTER_PD/160121/HEAD/outs/")
There are a bunch of datasets hoted by the Satija lab in the Seurat Data Package.
Let’s have a look at some basic QC information. Keep in mind that most seurat plots are ggplot object and can be manipulated as such.
Counts = UMI Features = Genes
plot1 <- VlnPlot(mouse_brain_sa, features = "nCount_Spatial", pt.size = 0.1) +
ggtitle("UMI") +
theme(axis.text.x = element_blank(), axis.title.x = element_blank(), legend.position = "right") +
NoLegend()
plot2 <- VlnPlot(mouse_brain_sa, features = "nFeature_Spatial", pt.size = 0.1) +
ggtitle("Genes") +
theme(axis.text.x = element_blank(), axis.title.x = element_blank(), legend.position = "right") +
NoLegend()
plot3 <- SpatialFeaturePlot(mouse_brain_sa, features = "nCount_Spatial") +
theme(legend.position = "right")
plot4 <- SpatialFeaturePlot(mouse_brain_sa, features = "nFeature_Spatial") +
theme(legend.position = "right")
plot1 + plot2 + plot3 + plot4 + plot_layout(nrow = 2, ncol = 2)
Spaceranger does normiliaztion for clustering and DE but does not return that normalized matrix
Pre-normalization Raw UMI counts
SE transform
Don’t worry about reachediteration limit warnings
Default assay will now be set to SCT
mouse_brain_sa <- SCTransform(mouse_brain_sa, assay = "Spatial", verbose = TRUE)
From Seurat:
The default parameters in Seurat emphasize the visualization of molecular data. However, you can also adjust the size of the spots (and their transparency) to improve the visualization of the histology image, by changing the following parameters:
pt.size.factor- This will scale the size of the spots. Default is 1.6
alpha - minimum and maximum transparency. Default is c(1, 1).
Try setting to alpha c(0.1, 1), to downweight the transparency of points with lower expression
p1 <- SpatialFeaturePlot(mouse_brain_sa, features = "Ttr", pt.size.factor = 1)+
theme(legend.position = "right") +
ggtitle("Actual Spot Size")
p2 <- SpatialFeaturePlot(mouse_brain_sa, features = "Ttr", alpha = c(0.1, 1))+
theme(legend.position = "right") +
ggtitle("Scaled Spot Size")
p1 + p2
Dimensionality reduction, clustering, and visualization
We can then proceed to run dimensionality reduction and clustering on the RNA expression data, using the same workflow as we use for scRNA-seq analysis.
Some of these processes can be parallized
library(future)
# check the current active plan
plan()
sequential:
- args: function (expr, envir = parent.frame(), substitute = TRUE, lazy = FALSE, seed = NULL, globals = TRUE, local = TRUE, earlySignal = FALSE, label = NULL, ...)
- tweaked: FALSE
- call: NULL
# change the current plan to access parallelization
plan("multiprocess", workers = 4)
[ONE-TIME WARNING] Forked processing ('multicore') is disabled in future (>= 1.13.0) when running R from RStudio, because it is considered unstable. Because of this, plan("multicore") will fall back to plan("sequential"), and plan("multiprocess") will fall back to plan("multisession") - not plan("multicore") as in the past. For more details, how to control forked processing or not, and how to silence this warning in future R sessions, see ?future::supportsMulticore
plan()
multiprocess:
- args: function (expr, envir = parent.frame(), substitute = TRUE, lazy = FALSE, seed = NULL, globals = TRUE, workers = 4, gc = FALSE, earlySignal = FALSE, label = NULL, ...)
- tweaked: TRUE
- call: plan("multiprocess", workers = 4)
mouse_brain_sa <- RunPCA(mouse_brain_sa, assay = "SCT", verbose = FALSE)
mouse_brain_sa <- FindNeighbors(mouse_brain_sa, reduction = "pca", dims = 1:30)
Computing nearest neighbor graph
Computing SNN
mouse_brain_sa <- FindClusters(mouse_brain_sa, verbose = FALSE)
mouse_brain_sa <- RunUMAP(mouse_brain_sa, reduction = "pca", dims = 1:30)
The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session15:46:04 UMAP embedding parameters a = 0.9922 b = 1.112
15:46:04 Read 2696 rows and found 30 numeric columns
15:46:04 Using Annoy for neighbor search, n_neighbors = 30
15:46:04 Building Annoy index with metric = cosine, n_trees = 50
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
15:46:04 Writing NN index file to temp file /tmp/RtmpdiyDPO/file86f2c25794501
15:46:04 Searching Annoy index using 4 threads, search_k = 3000
15:46:04 Annoy recall = 100%
15:46:05 Commencing smooth kNN distance calibration using 4 threads
15:46:06 Initializing from normalized Laplacian + noise
15:46:08 Commencing optimization for 500 epochs, with 105850 positive edges
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
15:46:15 Optimization finished
Now let’s have a look at the clustering
I don’t really like these colors so let’s change them
p1 <- DimPlot(mouse_brain_sa, reduction = "umap", label = TRUE) +
labs(color = "Cluster")
p2 <- SpatialDimPlot(mouse_brain_sa, label = TRUE, label.size = 3) +
labs(fill = "Cluster")
p1 + p2
myPalette <- colorRampPalette(rev(brewer.pal(11, "Spectral")))
Error in brewer.pal(11, "Spectral") :
could not find function "brewer.pal"
Interactivity not working for me on firefox
Need to upgrade my seurat
SpatialFeaturePlot(mouse_brain_sa, features = "Ttr", do.hover = TRUE)
LinkedDimPlot(mouse_brain_sa)
First we’ll idetify differentially expressed genes.
Parallelization helps here too let’s make sure our plan is still intact
plan()
multiprocess:
- args: function (expr, envir = parent.frame(), substitute = TRUE, lazy = FALSE, seed = NULL, globals = TRUE, workers = 4, gc = FALSE, earlySignal = FALSE, label = NULL, ...)
- tweaked: TRUE
- call: plan("multiprocess", workers = 4)
- call: plan("multiprocess", workers = 4) indicates that it is
Looks like we have some very DE genes for clusters 4 and 11
clarify what ident.1 = 4, ident.2 = 6 are for
de_markers <- FindMarkers(mouse_brain_sa, ident.1 = 4, ident.2 = 6)
what are the top variable features?
VariableFeatures(mouse_brain_sa)[1:10]
[1] "Ttr" "Plp1" "Hba-a1" "Hbb-bs" "Mbp" "Penk" "Ptgds" "Hba-a2" "S100a5" "Ppp1r1b"
what are the top de genes?
rownames(de_markers)[1:10]
[1] "Calb2" "Slc6a11" "Ckb" "Camk2n1" "Doc2g" "Camk2a" "Th" "Nrgn" "Cdhr1" "Slc1a2"
Using the top 100 variable genes find spatially enriched ones This will take X minutes something isn’t right here taking way too long
mouse_brain_sa <- FindSpatiallyVariableFeatures(mouse_brain_sa, assay = "SCT", features = VariableFeatures(mouse_brain_sa)[1:100],
selection.method = "markvariogram", verbose = TRUE)
images_cl <- list()
for (i in 1:length(lenas)) {
images_cl[[i]] <- readbitmap::read.bitmap(paste(get_spatial_path(lenas[i]), "/outs/spatial/tissue_lowres_image.png", sep = ""))
}
height <- list()
for (i in 1:length(lenas)) {
height[[i]] <- data.frame(height = nrow(images_cl[[i]]))
}
height <- bind_rows(height)
width <- list()
for (i in 1:length(lenas)) {
width[[i]] <- data.frame(width = ncol(images_cl[[i]]))
}
width <- bind_rows(width)
###Color from pipeline
grobs <- list()
for (i in 1:length(lenas)) {
grobs[[i]] <- grid::rasterGrob(images_cl[[i]], width=unit(1,"npc"), height=unit(1,"npc"))
}
images_tibble <- tibble(lena=lenas, grob=grobs)
images_tibble$lena <- factor(images_tibble$lena)
images_tibble$height <- height$height
images_tibble$width <- width$width
images_tibble
scales <- list()
for (i in 1:length(lenas)) {
path_scales <- paste(svenLib::get_spatial_path(lenas[i]), "/outs/spatial/scalefactors_json.json", sep = "")
scales[[i]] <- rjson::fromJSON(file = path_scales)
}
clusters <- list()
for (i in 1:length(lenas)) {
clusters[[i]] <- read.csv(paste(svenLib::get_spatial_path(lenas[i]),"/outs/analysis_csv/clustering/graphclust/clusters.csv", sep = ""))
}
tsne <- list()
for (i in 1:length(lenas)) {
tsne[[i]] <- read.csv(paste(svenLib::get_spatial_path(lenas[i]),"/outs/analysis_csv/tsne/2_components/projection.csv",sep = ""), header = T)
}
umap <- list()
for (i in 1:length(lenas)) {
umap[[i]] <- read.csv(paste(svenLib::get_spatial_path(lenas[i]),"/outs/analysis_csv/umap/2_components/projection.csv",sep = ""), header = T)
}
umap[[1]]
bcs <- list()
for (i in 1:length(lenas)) {
if (file.exists(paste(get_spatial_path(lenas[i]),"/outs/spatial/tissue_positions_list.txt", sep = ""))) {
bcs[[i]] <- read.csv(paste(get_spatial_path(lenas[i]), "/outs/spatial/tissue_positions_list.txt", sep = ""),
col.names=c("barcode","tissue","row","col","imagerow","imagecol"), header = F)
} else {
bcs[[i]] <- read.csv(paste(get_spatial_path(lenas[i]), "/outs/spatial/tissue_positions_list.csv", sep = ""),
col.names=c( "barcode","tissue","row","col","imagerow","imagecol"), header = F)
}
bcs[[i]]$imagerow_scaled <- bcs[[i]]$imagerow * scales[[i]]$tissue_lowres_scalef # scale tissue coordinates for lowres image
bcs[[i]]$imagecol_scaled <- bcs[[i]]$imagecol * scales[[i]]$tissue_lowres_scalef
bcs[[i]]$imagerow_scaled_round <- round(bcs[[i]]$imagerow * scales[[i]]$tissue_lowres_scalef) # Rounded scales
bcs[[i]]$imagecol_scaled_round <- round(bcs[[i]]$imagecol * scales[[i]]$tissue_lowres_scalef)
bcs[[i]]$tissue <- as.factor(bcs[[i]]$tissue)
bcs[[i]] <- merge(bcs[[i]], clusters[[i]], by.x = "barcode", by.y = "Barcode", all = TRUE)
bcs[[i]] <- merge(bcs[[i]], tsne[[i]], by.x = "barcode", by.y = "Barcode", all = TRUE)
bcs[[i]] <- merge(bcs[[i]], umap[[i]], by.x = "barcode", by.y = "Barcode", all = TRUE)
bcs[[i]]$height <- height$height[i]
bcs[[i]]$width <- width$width[i]
}
names(bcs) <- lenas
read_matrix <- function(sid) {
Matrix::t(Seurat::Read10X_h5(paste(get_spatial_path(sid = sid), "/outs/raw_feature_bc_matrix.h5", sep = "")))
}
matrix <- map(.x = lenas, .f = read_matrix)
names(matrix) <- lenas
matrix[[1]]
umi_sum <- list()
for (i in 1:length(lenas)) {
umi_sum[[i]] <- data.frame(barcode = row.names(matrix[[i]]),
sum_umi = Matrix::rowSums(matrix[[i]]))
}
names(umi_sum) <- lenas
umi_sum <- bind_rows(umi_sum, .id = "lena")
umi_sum
gene_sum <- list()
for (i in 1:length(lenas)) {
gene_sum[[i]] <- data.frame(barcode = row.names(matrix[[i]]),
sum_gene = Matrix::rowSums(matrix[[i]] != 0))
}
names(gene_sum) <- lenas
gene_sum <- bind_rows(gene_sum, .id = "lena")
gene_sum
# If you need to look at the correlation of gene expression between samples
gene_umi_sum <- list()
for (i in 1:length(lenas)) {
gene_umi_sum[[i]] <- data.frame(gene = colnames(matrix[[i]]),
gene_umi_sum = Matrix::colSums(matrix[[i]]))
}
names(gene_umi_sum) <- lenas
gene_umi_sum <- bind_rows(gene_umi_sum, .id = "lena")
gene_umi_sum
bcs_merge <- bind_rows(bcs, .id = "lena")
bcs_merge <- merge(bcs_merge,umi_sum, by = c("barcode", "lena"))
bcs_merge <- merge(bcs_merge,gene_sum, by = c("barcode", "lena"))
Define our color palette for plotting
myPalette <- colorRampPalette(rev(brewer.pal(11, "Spectral")))
plots <- list()
for (i in 1:length(lenas)) {
plots[[i]] <- bcs_merge %>%
filter(lena ==lenas[i]) %>%
filter(tissue =="1") %>%
ggplot(aes(x=imagecol_scaled,y=imagerow_scaled,fill=sum_umi)) +
geom_spatial(data=images_tibble[i,], aes(grob=grob), x=0.5, y=0.5)+
geom_point(shape = 21, colour = "black", size = 2, stroke = 0.1)+
coord_cartesian(expand=FALSE)+
scale_fill_gradientn(colours = myPalette(100))+
#facet_wrap(~lena)+
xlim(0,max(bcs_merge %>%
filter(lena ==lenas[i]) %>%
dplyr::select(width)))+
ylim(max(bcs_merge %>%
filter(lena ==lenas[i]) %>%
dplyr::select(height)),0)+
xlab("") +
ylab("") +
ggtitle(paste(lenas[i],": ", sample_type[i], sep = ""))+
labs(fill = "UMI")+
theme_set(theme_bw(base_size = 10))+
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line(colour = "black"),
axis.text = element_blank())
}
patchwork::wrap_plots(plots)
plots <- list()
for (i in 1:length(lenas)) {
plots[[i]] <- bcs_merge %>%
filter(lena ==lenas[i]) %>%
filter(tissue =="1") %>%
ggplot(aes(x=imagecol_scaled,y=imagerow_scaled,fill=sum_gene)) +
geom_spatial(data=images_tibble[i,], aes(grob=grob), x=0.5, y=0.5)+
geom_point(shape = 21, colour = "black", size = 2, stroke = 0.1)+
coord_cartesian(expand=FALSE)+
scale_fill_gradientn(colours = myPalette(100))+
#facet_wrap(~lena)+
xlim(0,max(bcs_merge %>%
filter(lena ==lenas[i]) %>%
dplyr::select(width)))+
ylim(max(bcs_merge %>%
filter(lena ==lenas[i]) %>%
dplyr::select(height)),0)+
xlab("") +
ylab("") +
ggtitle(paste(lenas[i],": ", sample_type[i], sep = ""))+
labs(fill = "Genes")+
theme_set(theme_bw(base_size = 10))+
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line(colour = "black"),
axis.text = element_blank())
}
patchwork::wrap_plots(plots)
plots <- list()
for (i in 1:length(lenas)) {
plots[[i]] <- bcs_merge %>%
filter(lena ==lenas[i]) %>%
add_column(GAPDH = matrix[[i]][,"GAPDH"]) %>%
ggplot(aes(x=imagecol_scaled,y=imagerow_scaled,fill=GAPDH)) +
geom_spatial(data=images_tibble[i,], aes(grob=grob), x=0.5, y=0.5)+
geom_point(shape = 21, colour = "black", size = 2, stroke = 0.1)+
coord_cartesian(expand=FALSE)+
scale_fill_gradientn(colours = myPalette(100))+
#facet_wrap(~lena)+
xlim(0,max(bcs_merge %>%
filter(lena ==lenas[i]) %>%
dplyr::select(width)))+
ylim(max(bcs_merge %>%
filter(lena ==lenas[i]) %>%
dplyr::select(height)),0)+
xlab("") +
ylab("") +
ggtitle(paste(lenas[i],": ", sample_type[i], sep = ""))+
labs(fill = "GAPDH UMI")+
theme_set(theme_bw(base_size = 10))+
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line(colour = "black"),
axis.text = element_blank())
}
patchwork::wrap_plots(plots)
plots <- list()
for (i in 1:length(lenas)) {
plots[[i]] <- bcs_merge %>%
filter(lena ==lenas[i]) %>%
filter(tissue == "1") %>%
na.omit() %>%
ggplot(aes(x=imagecol_scaled,y=imagerow_scaled,fill=factor(Cluster))) +
geom_spatial(data=images_tibble[i,], aes(grob=grob), x=0.5, y=0.5)+
geom_point(shape = 21, colour = "black", size = 2, stroke = 0.1)+
coord_cartesian(expand=FALSE)+
scale_fill_manual(values = c("#b2df8a","#e41a1c","#377eb8","#4daf4a","#ff7f00","gold",
"#a65628", "#999999", "black", "white", "purple", "brown"))+
xlim(0,max(bcs_merge %>%
filter(lena ==lenas[i]) %>%
dplyr::select(width)))+
ylim(max(bcs_merge %>%
filter(lena ==lenas[i]) %>%
dplyr::select(height)),0)+
xlab("") +
ylab("") +
ggtitle(paste(lenas[i],": ", sample_type[i], sep = ""))+
labs(fill = "Cluster")+
guides(fill = guide_legend(override.aes = list(size=3)))+
theme_set(theme_bw(base_size = 10))+
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line(colour = "black"),
axis.text = element_blank())
}
patchwork::wrap_plots(plots)
10x Genomics, patrick.roelli@10xgenomics.com ↩︎
10x Genomics, stephen.williams@10xgenomics.com↩︎
10x Genomics, stephen.williams@10xgenomics.com↩︎